GLM Part5

Author

Murray Logan

Published

23/07/2023

1 Preparations

Load the necessary libraries

library(glmmTMB)       #for model fitting
library(car)           #for regression diagnostics
library(broom)         #for tidy output
library(ggfortify)     #for model diagnostics
library(DHARMa)        #for residual diagnostics
library(see)           #for plotting residuals
library(knitr)         #for kable
library(effects)       #for partial effects plots
library(ggeffects)     #for partial effects plots
library(emmeans)       #for estimating marginal means
library(modelr)        #for auxiliary modelling functions
library(tidyverse)     #for data wrangling
library(lindia)        #for diagnostics of lm and glm
library(performance)   #for residuals diagnostics
library(sjPlot)        #for outputs
library(report)        #for reporting methods/results
library(easystats)     #framework for stats, modelling and visualisation
library(MASS)          #for negative binomials
library(MuMIn)         #for AICc
library(patchwork)     #for figure layouts

2 Scenario

Here is a modified example from Quinn and Keough (2002). Day and Quinn (1989) described an experiment that examined how rock surface type affected the recruitment of barnacles to a rocky shore. The experiment had a single factor, surface type, with 4 treatments or levels: algal species 1 (ALG1), algal species 2 (ALG2), naturally bare surfaces (NB) and artificially scraped bare surfaces (S). There were 5 replicate plots for each surface type and the response (dependent) variable was the number of newly recruited barnacles on each plot after 4 weeks.

Figure 1: Six-plated barnacle
Table 1: Format of day.csv data files
TREAT BARNACLE
ALG1 27
.. ..
ALG2 24
.. ..
NB 9
.. ..
S 12
.. ..
Table 2: Description of the variables in the day data file
TREAT Categorical listing of surface types. ALG1 = algal species 1, ALG2 = algal species 2, NB = naturally bare surface, S = scraped bare surface.
BARNACLE The number of newly recruited barnacles on each plot after 4 weeks.

3 Read in the data

As we are going to treat Treatment as a categorical predictor, we will specifically declare it as such straight after importing the data.

day <- read_csv("../public/data/day.csv", trim_ws = TRUE)
Rows: 20 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): TREAT
dbl (1): BARNACLE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
day <- day |> 
  mutate(TREAT = factor(TREAT))
day |> glimpse()
Rows: 20
Columns: 2
$ TREAT    <fct> ALG1, ALG1, ALG1, ALG1, ALG1, ALG2, ALG2, ALG2, ALG2, ALG2, N…
$ BARNACLE <dbl> 27, 19, 18, 23, 25, 24, 33, 27, 26, 32, 9, 13, 17, 14, 22, 12…
## Explore the first 6 rows of the data
day |> head()
# A tibble: 6 × 2
  TREAT BARNACLE
  <fct>    <dbl>
1 ALG1        27
2 ALG1        19
3 ALG1        18
4 ALG1        23
5 ALG1        25
6 ALG2        24
day |> str()
tibble [20 × 2] (S3: tbl_df/tbl/data.frame)
 $ TREAT   : Factor w/ 4 levels "ALG1","ALG2",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ BARNACLE: num [1:20] 27 19 18 23 25 24 33 27 26 32 ...
day |> datawizard::data_codebook()
day (20 rows and 2 variables, 2 shown)

ID | Name     | Type        | Missings |  Values |         N
---+----------+-------------+----------+---------+----------
1  | TREAT    | categorical | 0 (0.0%) |    ALG1 | 5 (25.0%)
   |          |             |          |    ALG2 | 5 (25.0%)
   |          |             |          |      NB | 5 (25.0%)
   |          |             |          |       S | 5 (25.0%)
---+----------+-------------+----------+---------+----------
2  | BARNACLE | numeric     | 0 (0.0%) | [8, 33] |        20
------------------------------------------------------------

4 Exploratory data analysis

Model formula: \[ y_i \sim{} \mathcal{Pois}(\lambda_i)\\ \log(\lambda_i) = \boldsymbol{\beta} \bf{X_i} \]

where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the intercept and treatment contrasts for the effects of Treatment on barnacle recruitment.

ggplot(day, aes(y=BARNACLE, x=TREAT)) +
    geom_boxplot() + 
    geom_point(color='red')

ggplot(day, aes(y=BARNACLE, x=TREAT)) +
    geom_violin() +
    geom_point(color='red')

ggplot(day, aes(y=BARNACLE, x=TREAT)) +
    geom_violindot(dots_size = 10) 

5 Fit the model

Categorical predictors are actually handled as a form of multiple regression. We start by dummy coding the levels into separate vectors containing only 0’s and 1’s. The following fabricated example should illustrate this process. The first column represents a response (the numbers are not important for the dummy coding, so they are excluded from the example). Column X represents a categorical variable with three levels (A, B and C). This could be dummy coded into three dummies (one for each level of the categorical predictor).

Y   X   Dummy 1   Dummy 2   Dummy 3
.   A   1         0         0
.   A   1         0         0
.   B   0         1         0
.   B   0         1         0
.   C   0         0         1
.   C   0         0         1

So, we could construct the linear predictor of a regression model as:

\[ log(\mu_i) = \beta_0 + \beta_1 D1 + \beta_2 D2 + \beta_3 D3 \]

however, this would be an overparameterized model as there are four effects parameters to estimate from only three chunks of data (one per level of the categorical predictor).

One option would be to drop the intercept:

\[ log(\mu_i) = \beta_1 D1 + \beta_2 D2 + \beta_3 D3 \] Now each \(\beta\) just represents the mean of each level of the categorical predictor. Although this is perfectly valid, it is not a common way to express the model as it does not include any effects (differences).

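To see this means parameterisation in action, here is a minimal sketch using the fabricated three-level predictor from the table above (the variable name `X` is illustrative only):

```r
## Fabricated three-level categorical predictor
X <- factor(c("A", "A", "B", "B", "C", "C"))

## Suppressing the intercept (~ 0 + X) yields one dummy column per level,
## so each coefficient estimates the mean of that level directly
model.matrix(~ 0 + X)
```

Each row contains a single 1 flagging the level that the observation belongs to.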
Another option is to re-parameterise this model such that the intercept represents the mean of the first level of the categorical predictor and then two additional \(\beta\) parameters represent the differences between the first level and each of the remaining levels. This parameterisation is called treatment contrasts.

When we fit treatment contrasts, the first dummy variable (which becomes the intercept column) is populated entirely with 1’s.

Y   X   Dummy 1   Dummy 2   Dummy 3
.   A   1         0         0
.   A   1         0         0
.   B   1         1         0
.   B   1         1         0
.   C   1         0         1
.   C   1         0         1

\[ log(\mu_i) = \beta_0 + \beta_2 D2 + \beta_3 D3 \]

Before fitting the model to our actual data, it might be instructive to estimate the effects via matrix multiplication so as to see the treatment contrasts in action. Note, this approach is essentially Ordinary Least Squares regression and, as such, assumes normality and homogeneity of variance. As the response observations are counts, a Gaussian distribution might not be the most appropriate choice. Nevertheless, it will suffice for the purpose of illustration.

#Effects model
## start by dummy coding the categorical predictor and expressing
## the treatment effects as a model matrix.
Xmat <- model.matrix(~TREAT, data=day)
Xmat |> head()
  (Intercept) TREATALG2 TREATNB TREATS
1           1         0       0      0
2           1         0       0      0
3           1         0       0      0
4           1         0       0      0
5           1         0       0      0
6           1         1       0      0
# solve(X'X)X'Y
solve(t(Xmat) %*% Xmat) %*% t(Xmat) %*% day$BARNACLE
            [,1]
(Intercept) 22.4
TREATALG2    6.0
TREATNB     -7.4
TREATS      -9.2

Now let’s fit the model. As the response observations are counts, a Poisson distribution would be an obvious choice for the model distribution. That said, it might be useful to also fit the model with a Gaussian distribution and an identity link (no transformation) to assist us in interpreting the estimated coefficients. To emphasise, the Gaussian model will only be used as a learning aid; as a model, it does not have as much merit as the Poisson model.

day_glm <- glm(BARNACLE ~ TREAT, data = day, family = 'gaussian')
day_glm1 <- glm(BARNACLE ~ TREAT, data = day, family = 'poisson')
day_glm1 <- glm(BARNACLE ~ TREAT, data = day, family = poisson(link = "log"))

6 Model validation

day_glm |> autoplot()

Conclusion:

  • there is not much alarming here
  • note, there is no value in displaying leverage and leverage-based diagnostics since leverage does not make much sense for categorical predictors. An observation must be in one of the levels - it cannot really be an outlier in this dimension.
day_glm1 |> autoplot(which = 1:6)

Conclusion:

  • there is not much alarming here
  • note, there is no value in displaying leverage and leverage-based diagnostics since leverage does not make much sense for categorical predictors. An observation must be in one of the levels - it cannot really be an outlier in this dimension.
day_glm |> influence.measures()
Influence measures of
     glm(formula = BARNACLE ~ TREAT, family = "gaussian", data = day) :

      dfb.1_ dfb.TREATA dfb.TREATN dfb.TREATS   dffit cov.r  cook.d hat inf
1   6.05e-01  -4.28e-01  -4.28e-01  -4.28e-01  0.6053 1.115 0.08900 0.2    
2  -4.38e-01   3.10e-01   3.10e-01   3.10e-01 -0.4378 1.326 0.04862 0.2    
3  -5.77e-01   4.08e-01   4.08e-01   4.08e-01 -0.5766 1.152 0.08143 0.2    
4   7.54e-02  -5.33e-02  -5.33e-02  -5.33e-02  0.0754 1.608 0.00151 0.2    
5   3.31e-01  -2.34e-01  -2.34e-01  -2.34e-01  0.3313 1.442 0.02843 0.2    
6   7.16e-17  -4.08e-01  -5.34e-17  -4.47e-17 -0.5766 1.152 0.08143 0.2    
7  -7.51e-17   4.28e-01   5.60e-17   4.70e-17  0.6053 1.115 0.08900 0.2    
8   2.19e-17  -1.25e-01  -1.63e-17  -1.37e-17 -0.1766 1.565 0.00824 0.2    
9   3.79e-17  -2.16e-01  -2.82e-17  -2.37e-17 -0.3051 1.467 0.02423 0.2    
10 -5.77e-17   3.29e-01   4.30e-17   3.61e-17  0.4650 1.293 0.05451 0.2    
11  1.47e-17  -4.15e-17  -5.78e-01   0.00e+00 -0.8180 0.839 0.15141 0.2    
12  4.54e-18  -1.28e-17  -1.79e-01   0.00e+00 -0.2533 1.512 0.01682 0.2    
13 -4.54e-18   1.28e-17   1.79e-01   0.00e+00  0.2533 1.512 0.01682 0.2    
14  2.25e-18  -6.38e-18  -8.90e-02   0.00e+00 -0.1259 1.591 0.00421 0.2    
15 -1.77e-17   5.00e-17   6.98e-01   0.00e+00  0.9866 0.643 0.20609 0.2    
16 -1.12e-18  -4.95e-18   8.12e-18  -1.07e-01 -0.1512 1.579 0.00606 0.2    
17 -5.15e-18  -2.27e-17   3.73e-17  -4.91e-01 -0.6937 0.998 0.11373 0.2    
18  1.69e-18   7.46e-18  -1.22e-17   1.61e-01  0.2276 1.532 0.01363 0.2    
19  7.06e-18   3.12e-17  -5.11e-17   6.73e-01  0.9515 0.681 0.19448 0.2    
20 -2.07e-18  -9.14e-18   1.50e-17  -1.97e-01 -0.2791 1.490 0.02036 0.2    
day_glm1 |> influence.measures()
Influence measures of
     glm(formula = BARNACLE ~ TREAT, family = poisson(link = "log"),      data = day) :

      dfb.1_ dfb.TREATA dfb.TREATN dfb.TREATS   dffit cov.r  cook.d hat inf
1   5.08e-01  -3.80e-01  -3.22e-01  -3.09e-01  0.5078 1.240 0.07380 0.2    
2  -3.93e-01   2.94e-01   2.49e-01   2.39e-01 -0.3929 1.377 0.04032 0.2    
3  -5.20e-01   3.89e-01   3.30e-01   3.17e-01 -0.5203 1.224 0.06752 0.2    
4   6.59e-02  -4.93e-02  -4.17e-02  -4.01e-02  0.0659 1.611 0.00126 0.2    
5   2.84e-01  -2.13e-01  -1.80e-01  -1.73e-01  0.2844 1.486 0.02358 0.2    
6  -3.01e-17  -3.02e-01   2.01e-17   3.53e-17 -0.4548 1.305 0.05326 0.2    
7   2.98e-17   2.99e-01  -1.99e-17  -3.50e-17  0.4508 1.310 0.05821 0.2    
8  -9.16e-18  -9.20e-02   6.13e-18   1.08e-17 -0.1386 1.585 0.00539 0.2    
9  -1.59e-17  -1.60e-01   1.06e-17   1.86e-17 -0.2403 1.522 0.01585 0.2    
10  2.32e-17   2.33e-01  -1.55e-17  -2.72e-17  0.3511 1.422 0.03565 0.2    
11 -2.42e-17   5.04e-17  -7.58e-01   3.04e-17 -0.9794 0.651 0.18750 0.2    
12 -6.90e-18   1.43e-17  -2.16e-01   8.65e-18 -0.2787 1.491 0.02083 0.2    
13  6.59e-18  -1.37e-17   2.06e-01  -8.26e-18  0.2663 1.501 0.02083 0.2    
14 -3.38e-18   7.04e-18  -1.06e-01   4.24e-18 -0.1366 1.586 0.00521 0.2    
15  2.45e-17  -5.10e-17   7.66e-01  -3.07e-17  0.9896 0.640 0.25521 0.2    
16 -1.10e-18  -3.52e-18  -1.01e-17  -1.39e-01 -0.1758 1.566 0.00852 0.2    
17 -5.54e-18  -1.78e-17  -5.08e-17  -7.03e-01 -0.8869 0.756 0.16004 0.2    
18  1.59e-18   5.11e-18   1.46e-17   2.02e-01  0.2552 1.511 0.01918 0.2    
19  6.41e-18   2.06e-17   5.88e-17   8.14e-01  1.0265 0.601 0.27367 0.2    
20 -2.06e-18  -6.62e-18  -1.89e-17  -2.62e-01 -0.3301 1.443 0.02865 0.2    
day_glm |> performance::check_model()
Not enough model terms in the conditional part of the model to check for
  multicollinearity.

Conclusion:

  • there is not much alarming here
  • note, there is no value in displaying leverage and leverage-based diagnostics since leverage does not make much sense for categorical predictors. An observation must be in one of the levels - it cannot really be an outlier in this dimension.
day_glm1 |> performance::check_model()
Not enough model terms in the conditional part of the model to check for
  multicollinearity.

Conclusion:

  • there is not much alarming here
  • note, there is no value in displaying leverage and leverage-based diagnostics since leverage does not make much sense for categorical predictors. An observation must be in one of the levels - it cannot really be an outlier in this dimension.
day.resids <- day_glm |> simulateResiduals(plot=TRUE)

day.resids |> testDispersion()


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.84683, p-value = 0.68
alternative hypothesis: two.sided

Conclusion:

  • there is not much alarming here
day.resids <- day_glm1 |> simulateResiduals(plot=TRUE)

day.resids |> testDispersion()


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.81979, p-value = 0.68
alternative hypothesis: two.sided

Conclusion:

  • there is not much alarming here
## overdispersion
1-pchisq(day_glm$deviance, day_glm$df.residual)
[1] 0
day_glm$deviance/day_glm$df.residual
[1] 18.575

Conclusion:

  • this is of little value as it is not really possible for a Gaussian model to be over-dispersed. Let’s ignore this…
1-pchisq(day_glm1$deviance, day_glm1$df.residual)
[1] 0.3719059
day_glm1$deviance/day_glm1$df.residual
[1] 1.075851

Conclusion:

  • there is no evidence for a lack of model fit
  • the estimated dispersion parameter is close to 1 suggesting no over-dispersion.

Conclusion:

  • the various diagnostics suggest that the Poisson model is likely to be reliable.
  • on the basis of the diagnostics, the Gaussian model would also be reliable. Nevertheless, we will only pursue the Gaussian model for the purpose of illustrating and interpreting the treatment effects and not because it is an appropriate model.
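If we wanted a quick side-by-side comparison of the two fits, the AICs can be tabulated (a quick sketch, for comparison only; lower AIC indicates relatively better support):

```r
## Compare the two candidate models on AIC (lower is better)
AIC(day_glm, day_glm1)
```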

7 Partial plots

day_glm |> plot_model(type='eff', show.data=TRUE)
$TREAT

day_glm1 |> plot_model(type='eff', show.data=TRUE)
$TREAT

day_glm |> allEffects() |> plot()

day_glm1 |> allEffects() |> plot()

allEffects(day_glm1, transformation=NULL) |> plot()

Predicted values are always on the response scale

day_glm |> ggpredict() |> plot(add.data=TRUE, jitter=FALSE)
$TREAT

day_glm1 |> ggpredict() |> plot(add.data=TRUE, jitter=FALSE)
$TREAT

## back.transform only applies to response transformations, not link functions
## day_glm1 |> ggpredict(back.transform = FALSE) |> plot()

Predicted values are always on the response scale

day_glm |> ggemmeans(~TREAT) |> plot(add.data=TRUE, jitter=FALSE)

day_glm1 |> ggemmeans(~TREAT) |> plot(add.data=TRUE, jitter=FALSE)

## back.transform only applies to response transformations, not link functions
## day_glm1 |> ggemmeans(back.transform = FALSE) |> plot()
day_glm |> estimate_relation() |> plot()

day_glm |> estimate_means() |> 
  plot()
We selected `at = c("TREAT")`.

day_glm1 |> estimate_relation() |> plot()

day_glm1 |> estimate_means() |>
  plot()
We selected `at = c("TREAT")`.
Error: Discrete value supplied to continuous scale

8 Model investigation / hypothesis testing

day_glm |> summary()

Call:
glm(formula = BARNACLE ~ TREAT, family = "gaussian", data = day)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   22.400      1.927  11.622 3.27e-09 ***
TREATALG2      6.000      2.726   2.201  0.04275 *  
TREATNB       -7.400      2.726  -2.715  0.01530 *  
TREATS        -9.200      2.726  -3.375  0.00386 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 18.575)

    Null deviance: 1033.8  on 19  degrees of freedom
Residual deviance:  297.2  on 16  degrees of freedom
AIC: 120.73

Number of Fisher Scoring iterations: 2

Conclusions:

  • the estimated y-intercept (mean number of newly recruited barnacles in the ALG1 group) is 22.4.
  • there are expected to be 6 more newly recruited barnacles on the ALG2 substrate than on the ALG1 substrate.
  • there are expected to be 7.4 and 9.2 fewer newly recruited barnacles on the NB and S substrates respectively than on the ALG1 substrate.
day_glm |> confint()
Waiting for profiling to be done...
                 2.5 %    97.5 %
(Intercept)  18.622300 26.177700
TREATALG2     0.657525 11.342475
TREATNB     -12.742475 -2.057525
TREATS      -14.542475 -3.857525
day_glm |> tidy(conf.int=TRUE)
# A tibble: 4 × 7
  term        estimate std.error statistic       p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>         <dbl>    <dbl>     <dbl>
1 (Intercept)    22.4       1.93     11.6  0.00000000327   18.6       26.2 
2 TREATALG2       6         2.73      2.20 0.0427           0.658     11.3 
3 TREATNB        -7.4       2.73     -2.71 0.0153         -12.7       -2.06
4 TREATS         -9.2       2.73     -3.38 0.00386        -14.5       -3.86
# warning this is only appropriate for html output
day_glm |> sjPlot::tab_model(show.se=TRUE, show.aic=TRUE)
  BARNACLE
Predictors Estimates std. Error CI p
(Intercept) 22.40 1.93 18.62 – 26.18 <0.001
TREAT [ALG2] 6.00 2.73 0.66 – 11.34 0.028
TREAT [NB] -7.40 2.73 -12.74 – -2.06 0.007
TREAT [S] -9.20 2.73 -14.54 – -3.86 0.001
Observations 20
R2 0.713
AIC 120.731
day_glm |> parameters() |> display()
Parameter Coefficient SE 95% CI t(16) p
(Intercept) 22.40 1.93 (18.62, 26.18) 11.62 < .001
TREAT (ALG2) 6.00 2.73 (0.66, 11.34) 2.20 0.028
TREAT (NB) -7.40 2.73 (-12.74, -2.06) -2.71 0.007
TREAT (S) -9.20 2.73 (-14.54, -3.86) -3.38 < .001
day_glm |> parameters() |> plot()

day_glm |>
  standardise_parameters(method = "basic") |> 
  equivalence_test() |> plot()

day_glm |>
  equivalence_test(range =  c(-0.1,0.1) * sd(day$BARNACLE)) |>
  plot()

day_glm1 |> summary()

Call:
glm(formula = BARNACLE ~ TREAT, family = poisson(link = "log"), 
    data = day)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.10906    0.09449  32.903  < 2e-16 ***
TREATALG2    0.23733    0.12638   1.878 0.060387 .  
TREATNB     -0.40101    0.14920  -2.688 0.007195 ** 
TREATS      -0.52884    0.15518  -3.408 0.000654 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 54.123  on 19  degrees of freedom
Residual deviance: 17.214  on 16  degrees of freedom
AIC: 120.34

Number of Fisher Scoring iterations: 4

Conclusions:

  • the estimated y-intercept (mean number of newly recruited barnacles in the ALG1 group) is 3.11 (on the link scale).
  • if, however, we exponentiate it (to back-transform it onto the response scale), then the expected number of newly recruited barnacles on the ALG1 substrate is 22.4.
  • the ALG2 substrate is expected to have 0.24 (link scale) more newly recruited barnacles than the ALG1 substrate.
  • when back-transformed (via exponentiation), this equates to a 1.27 fold increase in the number of newly recruited barnacles from the ALG1 to the ALG2 substrate. When expressed as a percentage, this equates to a 27% increase.
  • similarly, the NB and S substrates are expected to differ from the ALG1 substrate by -0.40 and -0.53 (link scale) respectively. These equate to 0.67 fold (-33%) and 0.59 fold (-41%) declines respectively.
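These back-transformations can be obtained directly from the coefficients; a minimal base-R sketch (the percentage line is simply `(fold - 1) × 100`):

```r
## Fold changes relative to ALG1 (exponentiated link-scale coefficients);
## the exponentiated intercept is the expected ALG1 count itself
exp(coef(day_glm1))

## Expressed as percentage changes (excluding the intercept)
100 * (exp(coef(day_glm1)[-1]) - 1)
```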
day_glm1 |> confint()
Waiting for profiling to be done...
                   2.5 %     97.5 %
(Intercept)  2.917958584  3.2887098
TREATALG2   -0.009471895  0.4865512
TREATNB     -0.696806649 -0.1108759
TREATS      -0.837588039 -0.2280991
day_glm1 |> confint() |> exp()
Waiting for profiling to be done...
                 2.5 %     97.5 %
(Intercept) 18.5034756 26.8082525
TREATALG2    0.9905728  1.6266964
TREATNB      0.4981736  0.8950498
TREATS       0.4327530  0.7960454
day_glm1 |> tidy(conf.int=TRUE)
# A tibble: 4 × 7
  term        estimate std.error statistic   p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)    3.11     0.0945     32.9  1.98e-237  2.92        3.29 
2 TREATALG2      0.237    0.126       1.88 6.04e-  2 -0.00947     0.487
3 TREATNB       -0.401    0.149      -2.69 7.20e-  3 -0.697      -0.111
4 TREATS        -0.529    0.155      -3.41 6.54e-  4 -0.838      -0.228
day_glm1 |> tidy(conf.int=TRUE, exponentiate=TRUE) 
# A tibble: 4 × 7
  term        estimate std.error statistic   p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)   22.4      0.0945     32.9  1.98e-237   18.5      26.8  
2 TREATALG2      1.27     0.126       1.88 6.04e-  2    0.991     1.63 
3 TREATNB        0.670    0.149      -2.69 7.20e-  3    0.498     0.895
4 TREATS         0.589    0.155      -3.41 6.54e-  4    0.433     0.796
day_glm1 |> tidy(conf.int=TRUE, exponentiate=TRUE) |>
  kable()
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 22.4000000 0.0944911 32.903208 0.0000000 18.5034756 26.8082525
TREATALG2 1.2678571 0.1263757 1.877957 0.0603870 0.9905728 1.6266964
TREATNB 0.6696429 0.1492042 -2.687664 0.0071954 0.4981736 0.8950498
TREATS 0.5892857 0.1551776 -3.407993 0.0006544 0.4327530 0.7960454
# warning this is only appropriate for html output
day_glm1 |> sjPlot::tab_model(show.se=TRUE,show.aic=TRUE)
  BARNACLE
Predictors Incidence Rate Ratios std. Error CI p
(Intercept) 22.40 2.12 18.50 – 26.81 <0.001
TREAT [ALG2] 1.27 0.16 0.99 – 1.63 0.060
TREAT [NB] 0.67 0.10 0.50 – 0.90 0.007
TREAT [S] 0.59 0.09 0.43 – 0.80 0.001
Observations 20
R2 Nagelkerke 0.902
AIC 120.340
day_glm1 |> parameters() |> display()
Parameter Log-Mean SE 95% CI z p
(Intercept) 3.11 0.09 (2.92, 3.29) 32.90 < .001
TREAT (ALG2) 0.24 0.13 (-9.47e-03, 0.49) 1.88 0.060
TREAT (NB) -0.40 0.15 (-0.70, -0.11) -2.69 0.007
TREAT (S) -0.53 0.16 (-0.84, -0.23) -3.41 < .001
day_glm1 |> parameters(exponentiate = TRUE) |> display()
Parameter IRR SE 95% CI z p
(Intercept) 22.40 2.12 (18.50, 26.81) 32.90 < .001
TREAT (ALG2) 1.27 0.16 (0.99, 1.63) 1.88 0.060
TREAT (NB) 0.67 0.10 (0.50, 0.90) -2.69 0.007
TREAT (S) 0.59 0.09 (0.43, 0.80) -3.41 < .001
day_glm1 |> parameters(exponentiate = TRUE) |> plot()

day_glm1 |>
  standardise_parameters(method = "basic") |> 
  equivalence_test() |> plot()

day_glm1 |>
  equivalence_test(range =  c(-0.1,0.1) * sd(log(day$BARNACLE))) |>
  plot()

9 Predictions

## Comparisons on a fractional scale
day_glm1 |> emmeans(~TREAT, type='response') |> pairs()
 contrast    ratio     SE  df null z.ratio p.value
 ALG1 / ALG2 0.789 0.0997 Inf    1  -1.878  0.2375
 ALG1 / NB   1.493 0.2228 Inf    1   2.688  0.0362
 ALG1 / S    1.697 0.2633 Inf    1   3.408  0.0037
 ALG2 / NB   1.893 0.2703 Inf    1   4.472  <.0001
 ALG2 / S    2.152 0.3205 Inf    1   5.143  <.0001
 NB / S      1.136 0.1918 Inf    1   0.757  0.8735

P value adjustment: tukey method for comparing a family of 4 estimates 
Tests are performed on the log scale 
## Including confidence intervals
day_glm1 |> emmeans(~TREAT, type='response') |> pairs() |> confint()
 contrast    ratio     SE  df asymp.LCL asymp.UCL
 ALG1 / ALG2 0.789 0.0997 Inf     0.570      1.09
 ALG1 / NB   1.493 0.2228 Inf     1.018      2.19
 ALG1 / S    1.697 0.2633 Inf     1.139      2.53
 ALG2 / NB   1.893 0.2703 Inf     1.312      2.73
 ALG2 / S    2.152 0.3205 Inf     1.467      3.15
 NB / S      1.136 0.1918 Inf     0.737      1.75

Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 4 estimates 
Intervals are back-transformed from the log scale 
## Confidence intervals and p-values
day_glm1 |> emmeans(~TREAT, type='response') |> pairs() |> summary(infer=TRUE)
 contrast    ratio     SE  df asymp.LCL asymp.UCL null z.ratio p.value
 ALG1 / ALG2 0.789 0.0997 Inf     0.570      1.09    1  -1.878  0.2375
 ALG1 / NB   1.493 0.2228 Inf     1.018      2.19    1   2.688  0.0362
 ALG1 / S    1.697 0.2633 Inf     1.139      2.53    1   3.408  0.0037
 ALG2 / NB   1.893 0.2703 Inf     1.312      2.73    1   4.472  <.0001
 ALG2 / S    2.152 0.3205 Inf     1.467      3.15    1   5.143  <.0001
 NB / S      1.136 0.1918 Inf     0.737      1.75    1   0.757  0.8735

Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 4 estimates 
Intervals are back-transformed from the log scale 
P value adjustment: tukey method for comparing a family of 4 estimates 
Tests are performed on the log scale 
## Let's store this for later

day.pairwise <- day_glm1 |>
    emmeans(~TREAT, type='response') |>
    pairs() |>
    summary(infer=TRUE) |>
    as.data.frame()

## Comparisons on an absolute scale
day_glm1 |> emmeans(~TREAT) |> regrid() |> pairs() |> summary(infer=TRUE)
 contrast    estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
 ALG1 - ALG2     -6.0 3.19 Inf   -14.189      2.19  -1.882  0.2356
 ALG1 - NB        7.4 2.73 Inf     0.374     14.43   2.706  0.0344
 ALG1 - S         9.2 2.67 Inf     2.345     16.06   3.448  0.0032
 ALG2 - NB       13.4 2.95 Inf     5.831     20.97   4.548  <.0001
 ALG2 - S        15.2 2.88 Inf     7.790     22.61   5.270  <.0001
 NB - S           1.8 2.37 Inf    -4.301      7.90   0.758  0.8733

Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 4 estimates 
P value adjustment: tukey method for comparing a family of 4 estimates 

Conclusions:

  • the contrasts section of the output indicates that there is evidence that:
  • ALG1 has 1.49 fold (49%) more newly recruited barnacles than the NB substrate
  • ALG1 has 1.7 fold (70%) more newly recruited barnacles than the S substrate
  • ALG2 has 1.89 fold (89%) more newly recruited barnacles than the NB substrate
  • ALG2 has 2.15 fold (115%) more newly recruited barnacles than the S substrate
  • ALG1 was not found to be different from ALG2, and NB was not found to be different from S
## Comparisons on a link scale
day_glm1 |> estimate_contrasts(transform = "link", p_adjust = "tukey")
No variable was specified for contrast estimation. Selecting `contrast = "TREAT"`.
Marginal Contrasts Analysis

Level1 | Level2 | Difference |        95% CI |   SE |  df |     z |      p
--------------------------------------------------------------------------
ALG1   |   ALG2 |      -0.24 | [-0.56, 0.09] | 0.13 | Inf | -1.88 | 0.238 
ALG1   |     NB |       0.40 | [ 0.02, 0.78] | 0.15 | Inf |  2.69 | 0.036 
ALG1   |      S |       0.53 | [ 0.13, 0.93] | 0.16 | Inf |  3.41 | 0.004 
ALG2   |     NB |       0.64 | [ 0.27, 1.01] | 0.14 | Inf |  4.47 | < .001
ALG2   |      S |       0.77 | [ 0.38, 1.15] | 0.15 | Inf |  5.14 | < .001
NB     |      S |       0.13 | [-0.31, 0.56] | 0.17 | Inf |  0.76 | 0.874 

Marginal contrasts estimated at TREAT
p-value adjustment method: Tukey
## Comparisons on a fractional scale
day_glm1 |> estimate_contrasts(transform = "none", p_adjust = "tukey")
No variable was specified for contrast estimation. Selecting `contrast = "TREAT"`.
Marginal Contrasts Analysis

Level1 | Level2 | Difference |        95% CI |   SE |  df |     z |      p
--------------------------------------------------------------------------
ALG1   |   ALG2 |      -0.24 | [-0.56, 0.09] | 0.13 | Inf | -1.88 | 0.238 
ALG1   |     NB |       0.40 | [ 0.02, 0.78] | 0.15 | Inf |  2.69 | 0.036 
ALG1   |      S |       0.53 | [ 0.13, 0.93] | 0.16 | Inf |  3.41 | 0.004 
ALG2   |     NB |       0.64 | [ 0.27, 1.01] | 0.14 | Inf |  4.47 | < .001
ALG2   |      S |       0.77 | [ 0.38, 1.15] | 0.15 | Inf |  5.14 | < .001
NB     |      S |       0.13 | [-0.31, 0.56] | 0.17 | Inf |  0.76 | 0.874 

Marginal contrasts estimated at TREAT
p-value adjustment method: Tukey
## Comparisons on an absolute scale
## does not seem possible yet

Define your own

Compare:

  1. ALG1 vs ALG2
  2. NB vs S
  3. average of ALG1+ALG2 vs NB+S
Levels   Alg1 vs Alg2   NB vs S   Alg vs Bare
Alg1     1              0         0.5
Alg2     -1             0         0.5
NB       0              1         -0.5
S        0              -1        -0.5
##      Alg1_Alg2 NB_S Alg_Bare
## ALG1         1    0      0.5
## ALG2        -1    0      0.5
## NB           0    1     -0.5
## S            0   -1     -0.5
cmat <- cbind('Alg1_Alg2' = c(1, -1, 0, 0),
              'NB_S'      = c(0, 0, 1, -1),
              'Alg_Bare'  = c(0.5, 0.5, -0.5, -0.5))
cmat
     Alg1_Alg2 NB_S Alg_Bare
[1,]         1    0      0.5
[2,]        -1    0      0.5
[3,]         0    1     -0.5
[4,]         0   -1     -0.5

It is important that each of the comparisons is independent of the others. We can test this by exploring the cross products of the contrast matrix. Independent comparisons will have a cross product of 0. So, if all of the values in the lower (and equivalently, the upper) triangle of the cross product matrix are 0, then all comparisons are independent. Those that are not 0 indicate a lack of independence and a need to remove one of the comparisons from the set of contrasts. Note, we can ignore the diagonal values.

crossprod(cmat)
          Alg1_Alg2 NB_S Alg_Bare
Alg1_Alg2         2    0        0
NB_S              0    2        0
Alg_Bare          0    0        1

Conclusions:

  • all comparisons are independent
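The off-diagonal check can also be automated; a small sketch (assuming `cmat` as defined above) that returns TRUE only when every pair of contrasts is orthogonal:

```r
## Cross products of the contrast matrix; all-zero off-diagonals imply
## pairwise independence (the diagonal can be ignored)
cp <- crossprod(cmat)
all(cp[lower.tri(cp)] == 0)
```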
day_glm1 |> emmeans(~TREAT, contr=list(TREAT=cmat), type='response')
$emmeans
 TREAT rate   SE  df asymp.LCL asymp.UCL
 ALG1  22.4 2.12 Inf      18.6      27.0
 ALG2  28.4 2.38 Inf      24.1      33.5
 NB    15.0 1.73 Inf      12.0      18.8
 S     13.2 1.62 Inf      10.4      16.8

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

$contrasts
 contrast        ratio     SE  df null z.ratio p.value
 TREAT.Alg1_Alg2 0.789 0.0997 Inf    1  -1.878  0.0604
 TREAT.NB_S      1.136 0.1918 Inf    1   0.757  0.4488
 TREAT.Alg_Bare  1.792 0.1890 Inf    1   5.536  <.0001

Tests are performed on the log scale 
day_glm1 |> emmeans(~TREAT, type='response') |>
    contrast(method=list(TREAT=cmat)) |>
    summary(infer=TRUE)
 contrast        ratio     SE  df asymp.LCL asymp.UCL null z.ratio p.value
 TREAT.Alg1_Alg2 0.789 0.0997 Inf     0.616      1.01    1  -1.878  0.0604
 TREAT.NB_S      1.136 0.1918 Inf     0.816      1.58    1   0.757  0.4488
 TREAT.Alg_Bare  1.792 0.1890 Inf     1.458      2.20    1   5.536  <.0001

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 
Tests are performed on the log scale 
## what about absolute differences
day_glm1 |> emmeans(~TREAT, type='link') |>
    regrid() |>
    contrast(method=list(TREAT=cmat)) |>
    summary(infer=TRUE)
 contrast        estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
 TREAT.Alg1_Alg2     -6.0 3.19 Inf    -12.25     0.247  -1.882  0.0598
 TREAT.NB_S           1.8 2.37 Inf     -2.85     6.455   0.758  0.4485
 TREAT.Alg_Bare      11.3 1.99 Inf      7.40    15.195   5.686  <.0001

Confidence level used: 0.95 

Conclusions:

  • the number of newly recruited barnacles was not found to be significantly different between the ALG1 and ALG2 substrates
  • the number of newly recruited barnacles was not found to be significantly different between the NB and S substrates
  • the number of newly recruited barnacles was found to be significantly higher on the algal substrates than on the bare substrates
    • the algal substrates attracted 1.79 fold (79%) more newly recruited barnacles than the bare substrates.

If you want to express the comparisons on an absolute scale…

day_glm1 |>
  estimate_contrasts(contrast = "TREAT", method = list(TREAT = cmat)) |>
  display()
Error in names(level_cols) <- c("Level1", "Level2"): 'names' attribute [2] must be the same length as the vector [1]

10 Summary figures

newdata <- day_glm1 |> emmeans(~TREAT, type='response') |>
    as.data.frame()
newdata
 TREAT rate       SE  df asymp.LCL asymp.UCL
 ALG1  22.4 2.116601 Inf  18.61303  26.95746
 ALG2  28.4 2.383275 Inf  24.09279  33.47724
 NB    15.0 1.732050 Inf  11.96198  18.80960
 S     13.2 1.624807 Inf  10.37047  16.80156

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 
## A quick version
ggplot(newdata, aes(y=rate, x=TREAT)) +
    geom_pointrange(aes(ymin=asymp.LCL, ymax=asymp.UCL)) +
    theme_classic()

A more thorough version that also includes the Tukey’s test

g1 <- ggplot(newdata, aes(y=rate, x=TREAT)) +
    geom_pointrange(aes(ymin=asymp.LCL, ymax=asymp.UCL))+
    geom_point()+
    scale_x_discrete('Treatment', breaks=c('ALG1','ALG2','NB','S'),
       labels=c('Algae spp 1', 'Algae spp 2', 'Naturally bare', 'Scraped bare'))+
    scale_y_continuous(expression(Number~of~newly~recruited~barnacles~(cm^2)))+
    theme_classic()

g2 <- day.pairwise |>
  ggplot(aes(y=ratio,  x=contrast)) +
  geom_hline(yintercept=1,  linetype='dashed') +
  geom_pointrange(aes(ymin=asymp.LCL,  ymax=asymp.UCL)) +
  scale_y_continuous(trans=scales::log2_trans(), breaks=scales::breaks_log(base=2)) +
  coord_flip(ylim=c(0.25, 4)) +
  theme_classic()

g1 + g2

newdata = with(day,  data.frame(TREAT=levels(TREAT)))
Xmat <- model.matrix(~TREAT,  newdata)
coefs <- coef(day_glm1)

fit <- as.vector(coefs %*% t(Xmat))
se <- sqrt(diag(Xmat %*% vcov(day_glm1) %*% t(Xmat)))
q <- qt(0.975, df=df.residual(day_glm1))
newdata = newdata |>
  mutate(Fit = exp(fit),
         Lower=exp(fit - q*se),
         Upper=exp(fit+q*se))
## A quick version
ggplot(newdata, aes(y=Fit, x=TREAT)) +
    geom_pointrange(aes(ymin=Lower, ymax=Upper)) +
    theme_classic()

A more thorough version that also includes the Tukey’s test

g1 <- ggplot(newdata, aes(y=Fit, x=TREAT)) +
    geom_pointrange(aes(ymin=Lower, ymax=Upper))+
    geom_point()+
    scale_x_discrete('Treatment', breaks=c('ALG1','ALG2','NB','S'),
       labels=c('Algae spp 1', 'Algae spp 2', 'Naturally bare', 'Scraped bare'))+
    scale_y_continuous(expression(Number~of~newly~recruited~barnacles~(cm^2)))+
    theme_classic()

g2 <- day.pairwise |>
  ggplot(aes(y=ratio,  x=contrast)) +
  geom_hline(yintercept=1,  linetype='dashed') +
  geom_pointrange(aes(ymin=asymp.LCL,  ymax=asymp.UCL)) +
  scale_y_continuous(trans=scales::log2_trans(), breaks=scales::breaks_log(base=2)) +
  coord_flip(ylim=c(0.25, 4)) +
  theme_classic()

g1 + g2

11 References

Quinn, G. P., and M. J. Keough. 2002. Experimental Design and Data Analysis for Biologists. Cambridge: Cambridge University Press.